
    Deep learning and bidirectional optical flow based viewport predictions for 360° video coding

    The rapid development of virtual reality applications continues to drive the need for better compression of 360° videos owing to their large content volume. These videos are typically converted to 2-D formats using various projection techniques in order to benefit from existing coding tools designed for conventional 2-D video compression. Although the recently emerged video coding standard, Versatile Video Coding (VVC), introduces 360°-video-specific coding tools, it fails to prioritize the user-observed regions in 360° videos, represented by rectilinear images called viewports. This leads to the encoding of redundant regions in the video frames, escalating the bit-rate cost of the videos. In response to this issue, this paper proposes a novel 360° video coding framework for VVC which exploits user-observed viewport information to alleviate pixel redundancy in 360° videos. In this regard, bidirectional optical flow, a Gaussian filter and Spherical Convolutional Neural Networks (Spherical CNN) are deployed to extract perceptual features and predict user-observed viewports. By appropriately fusing the predicted viewports onto the 2-D projected 360° video frames, a novel Regions of Interest (ROI) aware weightmap is developed, which can be used to mask the source video and introduce adaptive changes to the Lagrange and quantization parameters in VVC. Comprehensive experiments conducted in the context of VVC Test Model (VTM) 7.0 show that the proposed framework reduces bit rate, achieving an average bitrate saving of 5.85% and up to 17.15% at the same perceptual quality, as measured using the Viewport Peak Signal-to-Noise Ratio (VPSNR).
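    The abstract's key mechanism is an ROI-aware weightmap that masks the source video and drives adaptive changes to the quantization and Lagrange parameters. As a minimal, illustrative sketch only (not the paper's actual implementation), the Python snippet below maps a normalized weightmap to per-CTU QP offsets and a matching Lagrange multiplier; the CTU size, offset range, clipping to the 0-63 VVC QP range and the HEVC/VVC-style lambda formula are assumptions chosen for illustration.

```python
import numpy as np

def ctu_qp_offsets(weightmap, base_qp, max_offset=4, ctu=128):
    """Illustrative mapping from an ROI weightmap to per-CTU QP/lambda pairs.

    Assumptions (not taken from the paper): `weightmap` is a 2-D array in
    [0, 1] where values near 1 mark predicted-viewport (ROI) pixels, `ctu`
    is the CTU size in luma samples, and `max_offset` bounds the QP change.
    """
    h, w = weightmap.shape
    offsets = {}
    for y in range(0, h, ctu):
        for x in range(0, w, ctu):
            roi = weightmap[y:y + ctu, x:x + ctu].mean()
            # High ROI weight -> negative offset (finer quantization);
            # low weight -> positive offset (coarser quantization).
            dqp = int(round((0.5 - roi) * 2 * max_offset))
            qp = int(np.clip(base_qp + dqp, 0, 63))        # VVC QP range
            lam = 0.57 * 2.0 ** ((qp - 12) / 3.0)          # HEVC/VVC-style lambda
            offsets[(x, y)] = (qp, lam)
    return offsets
```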

    QoE Aware VVC Based Omnidirectional and Screen Content Coding

    "Widespread adoption of immersive media and communication tools, including Virtual Reality (VR), screen sharing and video conferencing applications, demand better compression of non-conventional video contents. Versatile Video Coding (VVC), the latest video coding standard which focusses on versatility introduces new coding toolsspecifically for the omnidirectional videos (360° videos) and artificially generated video (screen) contents. The special characteristics demonstrated by these non-conventional videos pose crucial challenges in developing efficient encoding algorithms. Moreover, VVC and state-of-the-art compression architectures provide inadequate coding support to address the spherical and the perceptual characteristics and exploit distinct features,including sharp edges and repeating patterns in 360° videos and screen content videos, respectively.In response, the first contribution introduces spherical characteristics to VVC to support its rectilinear functionalities. To this end, a novel spherical objective metric called the Weighted Craster Parabolic Peak-To-Signal Ratio (WCPPPSNR) is developed and used with newly designed residual weighting and multiple QuantizationParameter (QP) optimization techniques to improve the compression efficiency. This not only brings spherical characteristics to the video codec but acts as two-stage magnitude reduction of redundancy in both spatial and frequency domains. The results report that the proposed algorithms can improve the compression efficiency of VVC Test Model (VTM) 2.2 by 3.18% on average and up to 6.07%.The second contribution of the thesis proposes a novel 360° encoding that leverages user observed viewport information. In this regard, bidirectional optical flow, Gaussian filter and Spherical Convolutional Neural Networks (Spherical CNN) are deployed to extract perceptual features and predict the user observed viewports. By appropriately fusing the predicted viewports on the 2-D projected 360° video frames, a novel Regions Of Interest (ROI) aware weightmap is developed which can be used to mask the source video and introduce adaptive changes to the VVC coding tools. Comprehensive experiments conducted in the context of VTM 7.0 show that the proposed scheme can improve perceptual quality and reduce bitrates, achieving an average bitrate saving of5.85% and up to 17.15% for perceptual quality measurements.The final contribution introduces two affine prediction techniques that can extend the functionality of Intra Block Copy (IBC) and exploit the geometrical transformations between objects and characters in screen content videos. The first technique applies a Control Point Vector (CPV) search mechanism that allows search for more affinetransformed IBC blocks in a conventional manner which is identical to the motion estimation in inter blocks. In contrast, the second technique employs a parameter-based approach by predefining suitable affine transformations parameters that are applied on the IBC reference samples and compacting the information necessary to represent these transformations. In the context of VVC standard, the proposed techniques outperform the reference implementations and other state-of-the-art schemes, achieving consistent coding gains and up to 5.41% for specific screen content sequences.Finally, the proposed contributions have also been examined to test their performance in error prone networking environment by developing a transmission based Quality Of Experience (QoE) model from the VVC dependent parameters. 
The result shows superior gains in the QoE over the anchor implementations of the respective contributions.
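    To illustrate the idea behind a weighted spherical objective metric such as the WCPPPSNR mentioned above, the sketch below computes a latitude-weighted PSNR for an equirectangular frame; the cosine-of-latitude weights follow the common WS-PSNR convention and are an assumption here, not the thesis's Craster parabolic projection weighting.

```python
import numpy as np

def weighted_spherical_psnr(ref, rec, max_val=255.0):
    """Illustrative latitude-weighted PSNR for an equirectangular frame.

    The cosine-latitude weighting compensates for the oversampling of polar
    regions in the equirectangular projection; it is a stand-in for the
    projection-specific weighting used in the thesis.
    """
    ref = ref.astype(np.float64)
    rec = rec.astype(np.float64)
    h, w = ref.shape[:2]
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2   # row-centre latitudes
    weights = np.cos(lat)[:, None] * np.ones((1, w))
    if ref.ndim == 3:                                     # broadcast over colour channels
        weights = weights[:, :, None]
    wmse = np.sum(weights * (ref - rec) ** 2) / np.sum(weights * np.ones_like(ref))
    return 10.0 * np.log10(max_val ** 2 / wmse)
```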